Abstract

In recent years, the theory and application of complex networks have developed rapidly and remarkably, driven by the increasing amount of data from real systems and by the fruitful application of powerful methods from statistical physics. Many important characteristics of social or biological systems can be described by studying their underlying structure of interactions. Hierarchy is one of these features that can be formulated in the language of networks. In this paper we present analytic results on the hierarchical properties of random network models with zero correlations, and we also investigate the effects of different types of correlations. The behavior of hierarchy differs in the absence and in the presence of the giant components. We show that the hierarchical structure can be drastically different if there are one-point correlations in the network. We also show numerical results suggesting that hierarchy does not change monotonically with the correlations and that there is an optimal level of non-zero correlations maximizing the level of hierarchy.
1 Introduction

The application of complex networks to a broad range of social and biological systems has been the subject of much interest recently [1–5]. These applications involve, among others, the description of the "small-world" property of networks [6] or the consequences of scale-free degree distributions [7]. Many aspects of real systems can be studied in the framework of undirected networks [8–11]. The most important property of such a network is its degree distribution $p_k$, the probability that a randomly chosen node has $k$ edges [7]. Other features of the network can be understood through the degree distribution (the average length of the shortest path between nodes, the average number of edges between a node's neighbors [12], the characteristics of epidemics on the network [10], or robustness against failures and attacks [13]).

* Corresponding author: enys@hal.elte.hu

However, most real networks are directed, i.e., the connection between two units of the system is not symmetric. Many structural properties of directed networks can be derived from the undirected case in a straightforward way [8], but directionality also opens the door to features that are essentially different from those of undirected graphs (Fig. 1). In the presence of directed edges, the large-scale organization of the nodes can be very complex and a flow hierarchy can emerge. This is a global structure of the network that results from the different roles of the nodes (see Fig. 2 for a comparison of different hierarchy types). As a growing number of findings show, hierarchy is a frequent property of real networks, especially of networks describing social interactions [17–21]. The concept of hierarchy has led to different definitions (Fig. 2) and to algorithms for measuring it in both directed and undirected networks [22–27]. In this paper we investigate flow hierarchy using a corresponding measure, the global reaching centrality (GRC), which was designed to quantify it [28]. In the following sections we show that the behavior of the GRC is different below and above the critical average degree $k_c$, i.e., in the absence and in the presence of the giant components. We also give an approximation for its dependence on the average degree in different random network models without degree correlations. In the last section we study the effects of degree correlations.
[Figure 1 panels: (a) Undirected graph, (b) Directed graph]
Figure 1: The difference between undirected (a) and directed (b) graphs. On a large scale, the structure of an undirected graph (c) can be described by the largest connected component (the giant component, GC) and the small components (SC) [H]. In the case of a directed graph (d), the giant components are best summarized in the "bow-tie" diagram [15, 16]: the core is the giant strongly connected component (GSCC), inside which every node can reach every other. Nodes that can reach the whole GSCC but not vice versa form the IN set, and together with the GSCC they form the giant in-component (GIN). The same holds for the giant out-component (GOUT) in the reversed direction. Nodes that connect the GIN and GOUT components without being in the GSCC form the tubes, and tendrils attach to the GIN and GOUT components. The rest of the nodes are in small disconnected components (DC).
[Figure 2 panels: (a) Order hierarchy, (b) Nested hierarchy, (c) Flow hierarchy]

Figure 2: The three different types of hierarchy. (a) Order hierarchy is simply a rank assigned to each unit, making the units an ordered set. (b) Nested hierarchy is the hierarchy of nested clusters of nodes. (c) In a (complete) flow hierarchy, nodes can be layered in different levels so that nodes influenced by other nodes (via an out-edge) are at lower levels. As the figures illustrate, each type can be transformed into a flow hierarchy: in the order hierarchy one can introduce a directed edge between every pair of adjacent nodes, and in the nested hierarchy one can assign a virtual node to each cluster and link each cluster to the clusters it contains.



2 Reaching centralities in uncorrelated random networks



2.1 Local and global reaching centrality 

Given a directed graph $G(V,E)$ with $N = |V|$ vertices and $M = |E|$ edges, the local reaching centrality of node $i$ is defined as the number of nodes reachable from $i$ via out-edges, divided by the number of other nodes, $N-1$:

$$c_R(i) = \frac{|S_i|}{N-1}, \qquad (1)$$

where $S_i = \{ j \in V \mid 0 < d^{\mathrm{out}}(i,j) < \infty \}$ is the set of nodes at finite, non-zero out-distance from node $i$. We denote the size of the reachable set (i.e., the local reaching centrality without normalization) by $C_R = |S_i|$. The global reaching centrality of the graph is the normalized sum of the differences between the maximum local reaching centrality and the local reaching centralities of the nodes:

$$\mathrm{GRC} = \frac{\sum_{i \in V} \left[ c_R^{\max} - c_R(i) \right]}{N-1}. \qquad (2)$$
The normalization factor $N-1$ is the maximum possible value of the sum in a graph with $N$ nodes (attained in a star graph whose center points to all other nodes). The definition of the GRC can be written in a more expressive form:

$$\mathrm{GRC} = \frac{N}{N-1}\left( c_R^{\max} - \langle c_R \rangle \right). \qquad (3)$$

Hence, the GRC is proportional to the difference between the maximum and the average size of the reachable sets.
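To make Eqs. (1)–(3) concrete, here is a minimal Python sketch (the function names are our own, not from any library). It assumes the graph is stored as a dictionary of out-adjacency lists in which every node, including sinks, appears as a key; reachable sets are found by breadth-first search along out-edges.

```python
from collections import deque


def reachable_set_sizes(adj):
    """Return C_R(i) = |S_i| for every node i, found by BFS along out-edges."""
    sizes = {}
    for source in adj:
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes[source] = len(seen) - 1  # S_i excludes the node itself
    return sizes


def global_reaching_centrality(adj):
    """GRC of Eq. (2): sum of (c_R^max - c_R(i)) divided by N - 1."""
    n = len(adj)
    c = [size / (n - 1) for size in reachable_set_sizes(adj).values()]  # Eq. (1)
    c_max = max(c)
    return sum(c_max - ci for ci in c) / (n - 1)


star = {0: [1, 2, 3], 1: [], 2: [], 3: []}  # the center points to every leaf
path = {0: [1], 1: [2], 2: []}
print(global_reaching_centrality(star))  # 1.0
print(global_reaching_centrality(path))  # 0.75
```

The star graph reaches the maximal value 1, while a directed cycle gives 0, since every node then reaches all others.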



2.2 Generating function method in directed networks 

The calculation of the reachable set of a node is equivalent to the problem of finding its out-component, which is the union of the reachable set and the node itself. The out-component can be determined by generalizing the generating function formalism developed by Newman et al. [11, 12] to directed networks. Assuming that our graph has the joint degree distribution $p_{ij}$, i.e., the probability that a randomly chosen node has in-degree $i$ and out-degree $j$, the corresponding double generating function can be defined as

$$g_{00}(x,y) = \sum_{i,j=0}^{\infty} p_{ij}\, x^{i} y^{j} \qquad (4)$$

and the generating functions of the excess in- and out-degree distributions [8]:

$$g_{10}(x,y) = \frac{1}{\langle k\rangle}\,\partial_x g_{00}(x,y), \qquad (5)$$

$$g_{01}(x,y) = \frac{1}{\langle k\rangle}\,\partial_y g_{00}(x,y). \qquad (6)$$



Let $\pi_s^{\mathrm{out}}$ denote the probability that a randomly chosen node has an out-component of size $s$, and $\rho_s^{\mathrm{out}}$ the probability that a randomly chosen edge points to an out-component of size $s$. Their generating functions are:

$$h_0(y) = \sum_{s=1}^{\infty} \pi_s^{\mathrm{out}}\, y^{s}, \qquad (7)$$

$$h_1(y) = \sum_{s=1}^{\infty} \rho_s^{\mathrm{out}}\, y^{s}. \qquad (8)$$



If we assume that the graph is locally tree-like (i.e., loops are infrequent), they satisfy the following equations [8]:

$$h_0(y) = y\, g_{00}\big(1, h_1(y)\big), \qquad (9)$$

$$h_1(y) = y\, g_{10}\big(1, h_1(y)\big). \qquad (10)$$



Using these equations for the generating functions, it is possible to derive a closed formula for the out-components in directed graphs, Eq. (11) (for the details see Appendix A).
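For reference, one way to write such a closed formula follows from applying Lagrange inversion to Eqs. (9) and (10). The form below is our own reconstruction and may differ in notation from Appendix A:

$$\pi_1^{\mathrm{out}} = g_{00}(1,0), \qquad \pi_s^{\mathrm{out}} = \frac{\langle k\rangle}{(s-1)!}\left[\frac{d^{\,s-2}}{dy^{\,s-2}}\Big(g_{01}(1,y)\,\big[g_{10}(1,y)\big]^{s-1}\Big)\right]_{y=0}, \quad s \ge 2.$$

When $g_{01}(1,y) = g_{10}(1,y)$, as for the uncorrelated ER graph of Sec. 2.4, this reduces to the Borel distribution $\pi_s^{\mathrm{out}} = (s\langle k\rangle)^{s-1} e^{-s\langle k\rangle}/s!$.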
Thus, if we know the joint degree distribution, we can calculate $P(C_R)$ (and thus the distribution of the local reaching centralities as well, since it differs from $P(C_R)$ only by a scale factor). To calculate $P(C_R)$, we determine the out-component size distribution and apply the substitution $s = C_R + 1$. Knowing $P(C_R)$, we can calculate the GRC at different average degrees.

Before applying these equations to random networks with different degree distributions, we determine the GRC for a hierarchical tree.



2.3 Hierarchical tree 



2.3.1 Local reaching centralities 

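In a tree graph with $N$ vertices and branching number $d$, nodes in the same level have the same local reaching centrality. Let us denote the size of the reachable set of a node in the $\ell$-th level by $C_R^{\ell}$. With the root at level zero and the lowest level at $\ell = L$, the number of nodes and the size of the reachable set in the $\ell$-th level are the following (see Fig. 3):

$$k_\ell = d^{\ell}, \qquad (12)$$

$$C_R^{\ell} = \sum_{m=1}^{L-\ell} d^{m} = \frac{d^{L-\ell+1}-d}{d-1}. \qquad (13)$$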
The probability that a randomly chosen node has a reachable set of size $C_R^{\ell}$ is $k_\ell/N$. Combining Eqs. (12) and (13) gives:

$$P(C_R^{\ell}) = \frac{N(d-1)+1}{N(d-1)\,C_R^{\ell} + Nd}. \qquad (14)$$

In the asymptotic limit of $N \to \infty$, we get the following approximation for the probability of the normalized local reaching centrality (we also omit the $\ell$ index, since in this limit the allowed discrete values of $C_R^{\ell}/(N-1)$ become dense in $[0,1]$ and $c_R$ becomes continuous):

$$P(c_R) \approx \frac{1}{N c_R + \frac{d}{d-1}}. \qquad (15)$$

Comparison with numerical results is plotted in Fig. 4. In the simulations, the number of nodes was $10^4$, and thus Eqs. (12) and (13) do not hold for every level. The number of nodes in the lowermost level is smaller than $d^L$, which causes a smaller relative frequency for small local reaching centralities. Another effect is that the nodes below the topmost node have smaller reachable sets, shifting the corresponding points to the left (to smaller reaching centralities). The appearance of individual nodes with $c_R$ values other than the allowed values (as in Eq. (13)) is also an artifact of the finite number of nodes. Apart from these differences, the theoretical distribution describes the data qualitatively.
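Eqs. (12)–(14) can be checked on an explicit complete $d$-ary tree. The Python sketch below (hypothetical helper names) builds such a tree with levels $0,\dots,L$ and edges pointing away from the root, measures every reachable set by breadth-first search, and compares the level-wise relative frequencies $k_\ell/N$ with Eq. (14); exact fractions avoid rounding issues.

```python
from collections import deque
from fractions import Fraction


def complete_tree(d, L):
    """Out-adjacency of a complete d-ary tree with levels 0..L (root = node 0)."""
    n = (d ** (L + 1) - 1) // (d - 1)
    # With breadth-first numbering, the children of node i are d*i + 1, ..., d*i + d.
    return {i: [c for c in range(d * i + 1, d * i + d + 1) if c < n] for i in range(n)}


def reach_size(adj, source):
    """Number of nodes reachable from source along out-edges (source excluded)."""
    seen, queue = {source}, deque([source])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) - 1


def check_levels(d, L):
    """Per level: (k_l, measured sizes, Eq. (13), k_l / N, Eq. (14))."""
    adj = complete_tree(d, L)
    n = len(adj)
    rows = []
    for level in range(L + 1):
        k_l = d ** level                       # Eq. (12)
        first = (d ** level - 1) // (d - 1)    # index of the first node in this level
        measured = {reach_size(adj, i) for i in range(first, first + k_l)}
        c_l = (d ** (L - level + 1) - d) // (d - 1)
        eq14 = Fraction(n * (d - 1) + 1, n * (d - 1) * c_l + n * d)
        rows.append((k_l, measured, c_l, Fraction(k_l, n), eq14))
    return rows


for row in check_levels(2, 3):
    print(row)
```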
Figure 3: The number of nodes and the size of the reachable sets in the different levels.
Figure 4: Theoretical and measured distribution of the local reaching centrality in the hierarchical tree. The dots are results from graphs with $N = 10^4$. In these trees, the number of nodes in the last level is not equal to the corresponding power of the branching number, which causes a difference for very small and very large values of $c_R$ and the emergence of specific single values at the bottom. The theoretical curve has the form of f(x)
2.3.2 Global reaching centrality



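Using the equations for the number of nodes and the size of the reachable sets in the $\ell$-th level, we can easily calculate the total size of the reachable sets:

$$\sum_{i\in V} C_R(i) = \sum_{\ell=0}^{L} k_\ell\, C_R^{\ell} = \sum_{\ell=0}^{L} d^{\ell}\,\frac{d^{L-\ell+1}-d}{d-1}, \qquad (16)$$

where $L$ denotes the index of the lowest level (the level of the root is zero). This number can be determined from the constraint that the numbers of nodes in all levels add up to $N$:

$$N = \sum_{\ell=0}^{L} d^{\ell} = \frac{d^{L+1}-1}{d-1}. \qquad (17)$$

Solving for $L$ and simplifying gives:

$$L = \frac{\ln[N(d-1)+1]}{\ln d} - 1. \qquad (18)$$

Calculating the sum on the right-hand side of Eq. (16) gives:

$$\sum_{i\in V} C_R(i) = \frac{(L+1)\,d^{L+1}}{d-1} - \frac{d\,(d^{L+1}-1)}{(d-1)^2}. \qquad (19)$$

Substituting $L$ into the last equation and using Eq. (2), we get:

$$\mathrm{GRC} = \frac{1}{(N-1)^2}\left[N(N-1) + \frac{dN}{d-1} - \frac{[N(d-1)+1]\,\ln[N(d-1)+1]}{(d-1)\ln d}\right]. \qquad (20)$$

In the asymptotic case of $N \to \infty$:

$$\mathrm{GRC} \approx 1 + \frac{2d-1}{N(d-1)} - \frac{\ln[N(d-1)+1]}{N \ln d}. \qquad (21)$$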

In Fig. 5 we compare this result with numerical simulations, keeping only the logarithmic term in the plot. The deviation of the numerical results from the theoretical curve can originate from the smaller number of nodes in the lowermost level and from the fact that Eq. (18) for the number of levels does not hold exactly.
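The exact tree GRC of Eq. (2) and the logarithmic term kept in Fig. 5 can be compared directly. The Python sketch below (hypothetical names) sums Eqs. (12)–(13) level by level for a perfect tree, so it does not suffer from the incomplete lowermost level discussed above. The last printed column, $N$ times the gap between the two values, stays of order one, consistent with the neglected terms being $O(1/N)$.

```python
import math


def tree_grc_exact(d, L):
    """Exact GRC (Eq. (2)) of a complete d-ary tree with levels 0..L, via Eqs. (12)-(13)."""
    n = (d ** (L + 1) - 1) // (d - 1)
    c_max = n - 1  # the root reaches every other node
    total = sum(
        d ** level * (c_max - (d ** (L - level + 1) - d) // (d - 1))
        for level in range(L + 1)
    )
    # c_R = C_R / (N - 1), and Eq. (2) divides by N - 1 once more.
    return n, total / (n - 1) ** 2


def tree_grc_log_term(n, d):
    """Logarithmic term of Eq. (21): 1 - ln[N(d-1)+1] / (N ln d)."""
    return 1.0 - math.log(n * (d - 1) + 1) / (n * math.log(d))


for d, L in [(2, 15), (3, 10), (5, 6)]:
    n, exact = tree_grc_exact(d, L)
    approx = tree_grc_log_term(n, d)
    print(d, n, exact, approx, (exact - approx) * n)
```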
Figure 5: Comparison of the measured GRC with the theoretical prediction. The dots are the GRCs of trees with $N = 10^5$ nodes. The curve is a fit of the function $f(x) = 1 - \frac{\ln[a(x-1)+1]}{a \ln x}$ with $a = 118926$. The difference between $a$ and $N$ results from the fact that in the numerical calculations the total number of nodes is not the sum of a geometric series, as assumed in Eq. (17).



2.4 Erdős–Rényi graph



2.4.1 Local reaching centralities 



In the case of uncorrelated in- and out-degrees, the joint degree distribution of an Erdős–Rényi (ER) graph [11, 29] is the product of two independent Poisson distributions:

$$p_{ij} = e^{-2\langle k\rangle}\,\frac{\langle k\rangle^{i+j}}{i!\,j!}, \qquad (22)$$

where $\langle k\rangle$ denotes the average degree. The double generating function has the form

$$g_{00}(x,y) = e^{\langle k\rangle(x-1) + \langle k\rangle(y-1)}, \qquad (23)$$

which in this case is also the generating function of the excess degree distributions. Now, using Eq. (11), we get:

$$\pi_s^{\mathrm{out}} = \frac{(s\langle k\rangle)^{s-1}}{s!}\,e^{-s\langle k\rangle}, \qquad (24)$$

and for the size of the reachable set:

$$P(C_R) = \frac{\left[(C_R+1)\langle k\rangle\right]^{C_R}}{(C_R+1)!}\,e^{-(C_R+1)\langle k\rangle}. \qquad (25)$$
Fig. 6 compares this result with the numerical distributions. It is important to understand the limits of Eq. (25): it gives only the distribution of the sizes of the small reachable sets, i.e., it does not contain the GSCC¹. This can be seen in the plots. Below the transition, all nodes are in separate small components and the distribution of the reaching centralities decays very quickly. Above the transition point, the nodes start to aggregate in the GSCC. Note that in this regime, where the giant component appears, a large number of nodes have a large reachable set; more precisely, the nodes in the GIN component can reach every node in GOUT. The distribution of the local reaching centralities increasingly resembles a delta function at $c_R \approx 1$.



¹ However, the integrated size of the giant components is encoded in the distribution, since it is normalized to $N - |\mathrm{GWCC}|$, where GWCC is the giant weakly connected component.
Figure 6: Distribution of the reachable set sizes for the ER graph, based on the assumption that the graph is locally tree-like. Below the transition point, no node can reach a finite fraction of the graph. This changes when the GSCC appears at $\langle k\rangle = 1$. The numerical results are averages of 1000 independent calculations on networks with $N = 10^4$. Note that Eq. (25) describes the distribution of the small components; thus the emerging GSCC (the peak at large reachable sets) is beyond its scope.
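Eq. (25) is easy to evaluate numerically if one works in log space. In the Python sketch below (names are ours), the total probability carried by finite reachable sets is close to one below the transition and drops below one above it, where the remaining weight belongs to the giant components that Eq. (25) does not describe; below the transition the mean of the distribution is $\langle k\rangle/(1-\langle k\rangle)$.

```python
import math


def p_reach_er(c, k):
    """Eq. (25): probability that a node's small reachable set has size c (ER graph, <k> = k)."""
    s = c + 1  # size of the out-component, node included
    # Work in log space, since (s k)^(s-1) and s! overflow quickly.
    return math.exp((s - 1) * math.log(s * k) - s * k - math.lgamma(s + 1))


def small_component_mass(k, c_max=2000):
    """Total probability carried by finite reachable sets, truncated at c_max."""
    return sum(p_reach_er(c, k) for c in range(c_max + 1))


for k in (0.5, 0.9, 1.5, 2.0):
    print(k, small_component_mass(k))
```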



2.4.2 Global reaching centrality 

We have to distinguish between the graph without and with the GSCC [H]. Below the transition point $k_c$ there are only small components. Every node can reach only an infinitesimal part of the graph, so the average reachable set size is of order one ($\langle C_R\rangle = O(1)$). In this regime, the GRC is dominated by the maximum local reaching centrality. Since most nodes can reach very few other nodes and the distribution decays quickly, we can assume that the largest reachable set belongs to a single node (and that the corresponding out-component has the smallest relative frequency). In a graph with $N$ nodes, this condition translates to $P(C_R) \approx 1/N$ for $C_R = C_R^{\max}$. Thus, finding the reachable set size that has only one realization leads us to the largest component. In the limit $C_R^{\max} \gg 1$, we can use Stirling's formula to approximate $P(C_R)$, Eq. (26).
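Before any Stirling approximation, the condition $P(C_R) \approx 1/N$ can also be solved directly. The Python sketch below (hypothetical helper names) scans Eq. (25) for the largest $C_R$ whose probability still reaches $1/N$ and inserts it into Eq. (3), taking $\langle C_R\rangle = \langle k\rangle/(1-\langle k\rangle)$, the mean of Eq. (25) below the transition. It is a numerical stand-in for the Lambert-W route that follows, not the expression used in the text, and it assumes $\langle k\rangle < 1$.

```python
import math


def p_reach_er(c, k):
    """Eq. (25) for the ER graph, evaluated in log space."""
    s = c + 1
    return math.exp((s - 1) * math.log(s * k) - s * k - math.lgamma(s + 1))


def rarest_reachable_set(k, n):
    """Largest C_R with P(C_R) >= 1/N, the size expected to occur about once (needs 0 < k < 1)."""
    if not 0 < k < 1:
        raise ValueError("the rarest-component argument assumes 0 < <k> < 1")
    c = 0
    # Below the transition P(C_R) decreases monotonically, so a linear scan suffices.
    while p_reach_er(c + 1, k) >= 1.0 / n:
        c += 1
    return c


def grc_below_transition(k, n):
    """Eq. (3) with c_R^max = C_max / (N - 1) and <C_R> = k / (1 - k)."""
    c_max = rarest_reachable_set(k, n)
    mean_c = k / (1.0 - k)
    return n * (c_max - mean_c) / (n - 1) ** 2


for k in (0.3, 0.6, 0.9):
    print(k, rarest_reachable_set(k, 10**4), grc_below_transition(k, 10**4))
```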
Writing the $P(C_R) \approx 1/N$ condition and rearranging it for $C_R$, we get the exponential equation (27).
This equation can be solved in terms of the Lambert W function [30]. An equation of the form

$$A(x - R) = e^{-Bx} \qquad (28)$$

has the solution $x = R + \frac{1}{B}\,W\!\left(\frac{B e^{-BR}}{A}\right)$. In our case, $A$ is given by Eq. (29), and

$$B = -\left(\ln\langle k\rangle + 1 - \langle k\rangle\right), \qquad (30)$$

$$R = -1, \qquad (31)$$

so the final expression for the largest local reaching centrality, Eq. (32), follows by substituting these parameters into the solution of Eq. (28).
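The solution of Eq. (28) can be checked numerically. The Python sketch below is an illustration under stated assumptions: it implements only the principal branch $W_0$ for non-negative arguments (which suffices here, because $A > 0$ and $B > 0$ for $\langle k\rangle \neq 1$, so the argument $Be^{B}/A$ is positive), and the values of $A$ in the example are arbitrary, since Eq. (29) is not reproduced here.

```python
import math


def lambert_w0(z, tol=1e-15, max_iter=100):
    """Principal branch W0(z) for z >= 0, solving w * exp(w) = z with Newton's method."""
    if z < 0:
        raise ValueError("this sketch only covers z >= 0")
    w = math.log1p(z)  # log(1 + z) >= W0(z) for z >= 0, so Newton decreases monotonically
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * max(1.0, abs(w)):
            break
    return w


def solve_eq28(A, B, R):
    """Solve A (x - R) = exp(-B x) via x = R + W(B exp(-B R) / A) / B."""
    return R + lambert_w0(B * math.exp(-B * R) / A) / B


k = 0.5                           # an average degree below the transition
B = -(math.log(k) + 1.0 - k)      # Eq. (30)
R = -1.0                          # Eq. (31)
for A in (1e-2, 1e-3, 1e-4):      # illustrative values only
    x = solve_eq28(A, B, R)
    print(A, x, A * (x - R) - math.exp(-B * x))  # the residual should be close to zero
```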
Now we turn our attention to the case when the GSCC is already present. In this case, to a good approximation, we can ignore the nodes that are not in the bow-tie (they are relevant only for the average local reaching centrality, and even there their contribution is infinitesimal). Thinking of the bow-tie picture, we can assume that some nodes can reach the whole bow-tie. Thus, $C_R^{\max} \approx |\mathrm{GIN}| + |\mathrm{GOUT}| - |\mathrm{GSCC}|$. The average is slightly different and nontrivial, but let us assume that it equals the size of the giant out-component GOUT. If most of the nodes gather in the GSCC, it is also reasonable that the average size of the reachable sets is dominated by the nodes in the GSCC, which can reach the whole GOUT component (see Fig. 1). Under these assumptions, the average size of the reachable sets is approximately $|\mathrm{GOUT}|$. The relative sizes of these components (normalized by $N$) correspond to the local reaching centralities. In the generating function formalism they are given by:

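$$\frac{|\mathrm{GSCC}|}{N} = S = 1 - g_{00}(u,1) - g_{00}(1,v) + g_{00}(u,v), \qquad (33)$$

$$\frac{|\mathrm{GOUT}|}{N} = 1 - g_{00}(u,1), \qquad (34)$$

$$\frac{|\mathrm{GIN}|}{N} = 1 - g_{00}(1,v), \qquad (35)$$

where $1 - g_{00}(u,1)$ is the fraction of nodes with an infinite in-component (reachable from the GSCC), $1 - g_{00}(1,v)$ is the fraction with an infinite out-component (able to reach the GSCC), and $u$ and $v$ are the smallest non-negative solutions of the following equations [8]:

$$u = g_{01}(u,1), \qquad (36)$$

$$v = g_{10}(1,v). \qquad (37)$$

Since the generating function is symmetric in its variables, we have $u = v$. Using Eq. (23), the solution can be written in terms of the principal branch of the Lambert W function:

$$u(\langle k\rangle) = -\frac{1}{\langle k\rangle}\,W\!\left(-\langle k\rangle e^{-\langle k\rangle}\right). \qquad (38)$$

For the GRC above the phase transition we obtain:

$$\mathrm{GRC} \approx \frac{|\mathrm{GIN}|}{N} - S = g_{00}(u,1) - g_{00}(u,u). \qquad (39)$$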

The theoretical curve and the measurements are shown in Fig. 7. In the range $k_c < \langle k\rangle < 2$ the theoretical curve lies a little below the numerical results. This agrees with the assumptions used in deriving Eq. (39): the average reachable set is approximated by the GOUT component, but at small average degrees there are many small out-components that decrease the average. The relative sizes of the giant components are depicted in Fig. 8. Near the critical average degree, an appreciable portion of the nodes is clearly outside the giant components, and all of these nodes have very small reachable sets. Thus, the difference between the largest and the average reaching centrality is larger in the real network, resulting in a larger GRC. Below $k_c$, the theoretical curve predicts a lower GRC, which indicates that the distribution $P(C_R)$ described by Eq. (25) is no longer valid for the whole network when approaching the critical average degree.
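For the ER graph, Eqs. (33)–(39) reduce to simple expressions in $u$, because $g_{00}(u,1) = u$ and $g_{00}(u,u) = u^2$. The following Python sketch (our own helper names) finds the smallest root of $u = e^{\langle k\rangle(u-1)}$ by fixed-point iteration from $u = 0$, which converges monotonically to the same value as the Lambert-W expression of Eq. (38), and evaluates the bow-tie fractions and the GRC.

```python
import math


def er_u(k, tol=1e-15, max_iter=10**6):
    """Smallest root of u = exp(k (u - 1)), i.e. Eq. (36) for the ER graph."""
    u = 0.0
    for _ in range(max_iter):
        u_new = math.exp(k * (u - 1.0))  # increasing map, so iterates from 0 rise to the smallest root
        if abs(u_new - u) < tol:
            return u_new
        u = u_new
    return u


def er_giant_components(k):
    """Bow-tie fractions and GRC for the ER graph from Eqs. (33)-(35) and (39), with u = v."""
    u = er_u(k)
    g_u1 = math.exp(k * (u - 1.0))        # g00(u, 1) = g00(1, u) = u
    g_uu = math.exp(2.0 * k * (u - 1.0))  # g00(u, u) = u**2
    return {
        "GSCC": 1.0 - 2.0 * g_u1 + g_uu,
        "GIN": 1.0 - g_u1,
        "GOUT": 1.0 - g_u1,
        "GRC": g_u1 - g_uu,
    }


for k in (1.2, 1.5, 2.0, 3.0, 5.0):
    print(k, er_giant_components(k))
```

For example, $\langle k\rangle = 2$ gives $u \approx 0.203$ and $\mathrm{GRC} \approx 0.162$. Within this approximation, the ER result is simply $\mathrm{GRC} = u(1-u)$, which is largest ($1/4$) at $u = 1/2$, i.e., at $\langle k\rangle = 2\ln 2 \approx 1.39$.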
Figure 7: Global reaching centrality in the ER graph. The dots are averages of 1000 independent calculations on networks with $N = 10^4$. The errors are comparable to the size of the dots. The green and blue curves show the approximations in the two regimes. Both approximations deviate from the numerical values near the transition point.
Figure 8: The relative sizes of the giant components of the bow-tie diagram in the ER network (Fig. 1). Although GOUT grows quickly with the average degree, at low edge densities ($\langle k\rangle < 2$) both components contain less than 80% of the nodes.



2.5 Exponential network 
2.5.1 Local reaching centralities 

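In this section we calculate $P(C_R)$ and the GRC for the exponential network, motivated by the finding that the degree distributions of many real-world networks can be well fitted by an exponential [31]. An uncorrelated exponential network has a joint degree distribution of the form

$$p_{ij} = \left(1 - e^{-1/\kappa}\right)^2 e^{-(i+j)/\kappa}, \qquad (40)$$

where the average degree follows from the parameter $\kappa$: $\langle k\rangle = \left(e^{1/\kappa} - 1\right)^{-1}$. The double generating function and its derivatives are:

$$g_{00}(x,y) = \frac{\left(e^{1/\kappa}-1\right)^2}{\left(e^{1/\kappa}-x\right)\left(e^{1/\kappa}-y\right)}, \qquad (41)$$

$$g_{10}(x,y) = \frac{\left(e^{1/\kappa}-1\right)^3}{\left(e^{1/\kappa}-x\right)^2\left(e^{1/\kappa}-y\right)}, \qquad (42)$$

$$g_{01}(x,y) = \frac{\left(e^{1/\kappa}-1\right)^3}{\left(e^{1/\kappa}-x\right)\left(e^{1/\kappa}-y\right)^2}. \qquad (43)$$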
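By the formula for the out-components, Eq. (11), we have:

$$\pi_s^{\mathrm{out}} = \frac{(2s-2)!}{(s-1)!\,s!}\,e^{-(s-1)/\kappa}\left(1-e^{-1/\kappa}\right)^{s}. \qquad (44)$$

Substituting $e^{1/\kappa} = (\langle k\rangle + 1)/\langle k\rangle$, we get:

$$\pi_s^{\mathrm{out}} = \frac{(2s-2)!}{(s-1)!\,s!}\,\frac{\langle k\rangle^{s-1}}{(\langle k\rangle+1)^{2s-1}}. \qquad (45)$$

Translating this to the reachable set (see Fig. 9 for a comparison with numerical calculations):

$$P(C_R) = \frac{(2C_R)!}{C_R!\,(C_R+1)!}\,\frac{\langle k\rangle^{C_R}}{(\langle k\rangle+1)^{2C_R+1}}. \qquad (46)$$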
Figure 9: Distribution of the reachable set sizes in the exponential graph. Numerical results are averages of 1000 measurements on networks with $N = 10^4$. The emergence of the giant components (which are not included in the analytical distribution) is clearly seen at $\langle k\rangle = 1$.
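Eq. (46) contains the Catalan numbers $(2C_R)!/[C_R!\,(C_R+1)!]$, so it can be evaluated stably with log-gamma functions. In the Python sketch below (hypothetical names), the probabilities add up to one for $\langle k\rangle < 1$, while for $\langle k\rangle = 2$ only half of the weight remains in finite reachable sets; the rest belongs to the giant components, in line with Fig. 9.

```python
import math


def p_reach_exp(c, k):
    """Eq. (46): P(C_R = c) in the uncorrelated exponential network with average degree k."""
    # (2c)! / (c! (c+1)!) is the c-th Catalan number; lgamma keeps it in floating range.
    log_catalan = math.lgamma(2 * c + 1) - math.lgamma(c + 1) - math.lgamma(c + 2)
    return math.exp(log_catalan + c * math.log(k) - (2 * c + 1) * math.log(k + 1.0))


for k in (0.5, 2.0):
    mass = sum(p_reach_exp(c, k) for c in range(3000))
    print(k, mass)
```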



2.5.2 Global reaching centrality 

The first task is to find the smallest solutions of the equations $u = g_{01}(u,1)$ and $v = g_{10}(1,v)$. Since we assumed uncorrelated exponential distributions, $u = v$. The equation

$$u = g_{01}(u,1) = \frac{1}{\langle k\rangle\left(e^{1/\kappa} - u\right)} \qquad (47)$$

is quadratic in $u$ and has two solutions, $u_1 = 1$ and $u_2 = 1/\langle k\rangle$. Thus for $\langle k\rangle < 1$ there are no giant components, and we can use the "rarest component" assumption as before with the ER graph. To do so, we have to solve the following equation for $C_R$:

$$\frac{(2C_R)!}{C_R!\,(C_R+1)!}\,\frac{\langle k\rangle^{C_R}}{(\langle k\rangle+1)^{2C_R+1}} = \frac{1}{N}. \qquad (48)$$

Using Stirling's formula (without going into the details), this equation can be approximated by Eq. (49). We assume that the rarest component is much larger than one, thus $\left(\frac{C_R+1}{C_R}\right)^{C_R} \approx e$. This simplifies the equation, which we can rewrite as

$$A\,C_R = e^{-B C_R}, \qquad (50)$$

where we used the shorthand notations $A$ and $B$ defined in Eqs. (51) and (52) and neglected the additional 1 on the left-hand side. The solution of Eq. (49) is then:

$$C_R^{\max} = \frac{1}{B}\,W\!\left(\frac{B}{A}\right). \qquad (53)$$

If $\langle k\rangle > 1$, this approximation fails because of the appearance of the giant components. Using the solution of Eq. (47), the relative sizes of the parts of the bow-tie diagram are given by Eqs. (54)–(56), and by the same argument as with the ER graph we obtain the GRC, Eq. (57). This result is shown in Fig. 10 along with the numerical results. The same reasoning about the deviations applies to the exponential network as in the previous section: the predicted curves fit well in the regimes of very small and of large average degree.
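The bow-tie fractions and the GRC of the exponential network follow from Eqs. (33)–(35) and (39) with $u = v = \min(1, 1/\langle k\rangle)$ from Eq. (47). The Python sketch below (our own function name) evaluates them with the closed form of $g_{00}$, written through $q = e^{-1/\kappa} = \langle k\rangle/(\langle k\rangle+1)$.

```python
def exp_network_giant_components(k):
    """Bow-tie fractions and GRC of the uncorrelated exponential network (Eqs. (33)-(35), (39))."""
    q = k / (k + 1.0)  # q = exp(-1/kappa), from <k> = (e^{1/kappa} - 1)^{-1}

    def g00(x, y):
        # Double generating function of Eq. (40): a product of two geometric series.
        return (1.0 - q) ** 2 / ((1.0 - q * x) * (1.0 - q * y))

    u = min(1.0, 1.0 / k)  # smallest root of Eq. (47); u = v by symmetry
    return {
        "GSCC": 1.0 - g00(u, 1.0) - g00(1.0, u) + g00(u, u),
        "GIN": 1.0 - g00(1.0, u),
        "GOUT": 1.0 - g00(u, 1.0),
        "GRC": g00(u, 1.0) - g00(u, u),
    }


for k in (0.5, 1.5, 2.0, 4.0):
    print(k, exp_network_giant_components(k))
```

With $u = 1/\langle k\rangle$ the expression simplifies to $\mathrm{GRC} \approx (\langle k\rangle - 1)/\langle k\rangle^2$ above the transition, with a maximum of $1/4$ at $\langle k\rangle = 2$; this closed form is our own simplification of Eq. (39) for this model.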
Figure 10: Global reaching centrality of the exponential graph and the predicted curves in the different regimes of $\langle k\rangle$. Every numerical point is the average of 1000 independent runs on networks with $N = 10^4$. The magnitude of the errors is of the order of the dot sizes.



2.6 Scale-free network

Without exponential cutoff, the probability distribution and the double generating function of a scale-free network [32, 33] are given by Eqs. (58) and (59), and the generating functions of the excess degree distributions by Eqs. (60) and (61).

Newman showed in [8] that the condition for the existence of the giant components is

$$\partial_x \partial_y g_{00}(x,y)\big|_{x=y=1} > \langle k\rangle. \qquad (62)$$

For the scale-free network, this condition takes the form of Eq. (63). Now, if we make use of the relation between the exponent and the average degree and substitute $x = y = 1$, we get:

$$\langle k\rangle^2 > \langle k\rangle, \qquad (64)$$

or equivalently

$$\langle k\rangle > 1. \qquad (65)$$

So there is a giant component if the average degree is larger than one. But if we look at the average degree as a function of the exponent, Eq. (66), we can conclude that this condition gives giant components for any $\gamma > 2$. Numerical simulations show that this is not the case (see Fig. 11).
Figure 11: Distribution of the reachable set sizes in the scale-free graph. The data points are averages of 1000 distributions on networks with $N = 10^4$. The giant components emerge at $\gamma = 2$ and vanish above this threshold.



If we substitute $g_{00}(x,y)$ and its derivatives into Eq. (33) and Eqs. (36)–(37), we observe that there are giant components of unit size for every exponent larger than two, since the formula (67) gives exactly zero for any $s$, and the equation $u = g_{01}(u,1)$ always has two solutions, $u = 0$ and $u = 1$. These results suggest that the generating function formalism can hardly be applied to directed scale-free networks; the limits of the formalism are well known for undirected networks as well [9]. However, it is possible to give a rather qualitative approximation for the GRC in the $\gamma > 2$ regime. We use our observation from the simulations that there is no GSCC, and that very large degrees can appear in scale-free networks. Since the vast majority of the nodes have few out-edges, the GRC is dominated by $C_R^{\max}$. Let us assume that the network breaks down into small components, each formed by the neighborhood of a node with a large out-degree (i.e., every component gathers around a hub). In this case, the largest reachable set is the largest out-degree. In a scale-free network with degree distribution $p_k \propto k^{-\gamma}$, the largest degree is well approximated by $k_{\max} \approx N^{1/(\gamma-1)}$ [34]. Using this approximation for the out-degrees and not taking the in-degrees into account, we get:

$$\mathrm{GRC} \approx \frac{k_{\max}}{N} \approx N^{\frac{2-\gamma}{\gamma-1}}. \qquad (69)$$

Fig. 12 compares the measured GRC with this approximation. For large exponents, Eq. (69) fits the numerical results well; below $\gamma = 3$ it becomes less accurate. Note that the lower the exponent, the larger the portion of nodes with large degrees. At some point ($\gamma \approx 2.3$) the predicted GRC becomes larger than the measured value: the number of large hubs increases, which results in a larger average reaching centrality as well.
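The estimate of Eq. (69) and the $k_{\max} \approx N^{1/(\gamma-1)}$ scaling behind it can be illustrated with a few lines of Python. As a simplifying assumption, the sketch draws continuous Pareto variables (`random.Random.paretovariate`, with shape $\gamma - 1$ and minimum 1) instead of discrete degrees; the ratio of the sampled maximum to $N^{1/(\gamma-1)}$ should then be of order one, although it fluctuates from sample to sample. The function names are hypothetical.

```python
import random


def grc_scale_free_estimate(n, gamma):
    """Eq. (69): GRC ~ k_max / N with k_max ~ N**(1/(gamma - 1))."""
    return n ** ((2.0 - gamma) / (gamma - 1.0))


def sampled_max_degree(n, gamma, seed=0):
    """Largest of n continuous power-law samples with P(X > x) = x**-(gamma - 1), x >= 1."""
    rng = random.Random(seed)
    return max(rng.paretovariate(gamma - 1.0) for _ in range(n))


n = 10**4
for gamma in (2.5, 3.0, 3.5):
    ratio = sampled_max_degree(n, gamma) / n ** (1.0 / (gamma - 1.0))
    print(gamma, grc_scale_free_estimate(n, gamma), ratio)
```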
Figure 12: Global reaching centrality of the scale-free graph. Numerical results are averages of 1000 independent calculations on networks with $N = 10^4$. Errors are of the order of the dot sizes. The largest-degree approximation fits well for $\gamma > 3$ and gives a lower value below it.



3 Effects of correlations 



3.1 Degree-correlations 

In the above sections we assumed that the joint probability distribution of the in- and out-degrees is simply the product of two independent distributions, i.e., that there are no degree correlations. However, such correlations are known to exist in real networks [35], and they are known to have marked effects on various properties of networks (percolation thresholds, epidemic thresholds, etc.) [36, 37]. In this paper, we study the effect of two types of correlations on the GRC: one-point correlations and directed assortative mixing. The one-point correlation is the Pearson correlation between the in- and out-degree of a node:

$$\rho = \frac{\langle k_{\mathrm{in}} k_{\mathrm{out}}\rangle_V - \langle k_{\mathrm{in}}\rangle_V \langle k_{\mathrm{out}}\rangle_V}{\sigma_V(k_{\mathrm{in}})\,\sigma_V(k_{\mathrm{out}})}. \qquad (70)$$

The brackets denote averages over the nodes, and $\sigma_V(k) = \sqrt{\langle k^2\rangle_V - \langle k\rangle_V^2}$ is the standard deviation of the corresponding degrees. Similarly, one can define the Pearson correlation between the out-degree of the start of an edge and the in-degree of its end [35]:

$$r = \frac{\langle j_{\mathrm{in}} k_{\mathrm{out}}\rangle_E - \langle j_{\mathrm{in}}\rangle_E \langle k_{\mathrm{out}}\rangle_E}{\sigma_E(j_{\mathrm{in}})\,\sigma_E(k_{\mathrm{out}})}. \qquad (71)$$

Here $j_{\mathrm{in}}$ is the in-degree of the node an edge points to and $k_{\mathrm{out}}$ is the out-degree of the node that edge comes from; the averages run over the edges. We will refer to this quantity as the two-point correlation or directed assortativity, since it is a plausible generalization of assortativity in undirected networks. Table 1 shows the two correlations in several real networks.
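Both correlation measures can be computed from an edge list. The Python sketch below (hypothetical names) uses population moments, as in Eqs. (70)–(71); it assumes each edge is an ordered pair (tail, head) and that the degrees are not all equal, since otherwise the standard deviations vanish and the correlations are undefined.

```python
import math


def pearson(xs, ys):
    """Pearson correlation with population (1/n) moments, as in Eqs. (70)-(71)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def degree_correlations(nodes, edges):
    """Return (rho, r): the one-point (Eq. (70)) and two-point (Eq. (71)) correlations."""
    k_in = {v: 0 for v in nodes}
    k_out = {v: 0 for v in nodes}
    for a, b in edges:  # directed edge a -> b
        k_out[a] += 1
        k_in[b] += 1
    rho = pearson([k_in[v] for v in nodes], [k_out[v] for v in nodes])
    # Over edges: in-degree of the head (j_in) versus out-degree of the tail (k_out).
    r = pearson([k_in[b] for _, b in edges], [k_out[a] for a, _ in edges])
    return rho, r


# A star with reciprocated edges.
nodes = [0, 1, 2, 3]
edges = [(0, i) for i in (1, 2, 3)] + [(i, 0) for i in (1, 2, 3)]
print(degree_correlations(nodes, edges))
```

In the reciprocated star every node has equal in- and out-degree, so $\rho = 1$, while every edge connects the hub to a leaf, so the two-point correlation is $-1$.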
Category        Network              ⟨k⟩      One-point corr.   Two-point corr.
Food networks   Ythan                 4.452    0.168            -0.249
                LittleRock [38]      13.628   -0.138            -0.394
                Grassland [2]         1.557   -0.179            -0.233
Electric        s1488 [39]            2.085   -0.274             0.218
                s5378 [39]            1.467   -0.137             0.151
                s35932 [39]           1.683   -0.074             0.088
Trust           WikiVote [40]        14.573    0.318            -0.083
                College [dUEE]        3        0.053            -0.159
                Prison [41, 42]       2.716    0.201             0.129
Regulatory      TRN-Yeast-1 [p]       2.899    0.025            -0.173
                TRN-Yeast-2 [B]       1.568   -0.236            -0.220
                TRN-EC [B]            1.239   -0.082             0.085
Metabolic       C. elegans [3]        2.442    0.924            -0.174
                E. coli [3]           2.533    0.923            -0.167
                S. cerevisiae [3]     2.537    0.923             0.182

Table 1: The one- and two-point correlations of real networks along with their references. 
With the exception of the metabolic networks, most of them have small correlations. 



3.2 One-point correlations 

3.2.1 Numerical results 

To study the effect of the one-point correlation, we generated the in- and out-degree lists of the network with a given distribution and average degree. After randomizing both lists, we fixed the out-degree list and successively swapped randomly chosen elements of the in-degree list. After every swap, we calculated $\rho(k_{\mathrm{in}}, k_{\mathrm{out}})$ and accepted the new list whenever the correlation increased (or decreased, depending on the direction of the optimization). We measured the GRC whenever the difference between the correlation of the current state and that of the last measured state exceeded 0.01 (this is the resolution of the measured GRC(ρ) function)². The dependence of the GRC on the one-point correlation is shown in Fig. 13. It is clearly seen that the behavior of the GRC varies and depends strongly on the average degree. In the ER and exponential graphs, even the monotonicity of the curves changes. In the scale-free network, the GRC saturates at some level of correlation, and the exponent affects only the threshold above which further correlations do not change the GRC.
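The swap protocol described above can be sketched as follows. This is a minimal Python illustration with our own names; it only rearranges the in-degree list and records the correlation values at which the GRC would be measured (changes larger than 0.01), leaving out building the network and measuring the GRC itself. The degree lists in the example are arbitrary test data.

```python
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5


def tune_one_point_correlation(k_in, k_out, steps, increase=True, seed=0):
    """Swap entries of k_in (k_out fixed), keeping a swap only if rho moves in the chosen direction."""
    rng = random.Random(seed)
    k_in = list(k_in)
    rho = pearson(k_in, k_out)
    checkpoints = [rho]  # states at which the GRC would be measured
    for _ in range(steps):
        a, b = rng.randrange(len(k_in)), rng.randrange(len(k_in))
        k_in[a], k_in[b] = k_in[b], k_in[a]
        new_rho = pearson(k_in, k_out)
        accept = new_rho > rho if increase else new_rho < rho
        if accept:
            rho = new_rho
            if abs(rho - checkpoints[-1]) > 0.01:
                checkpoints.append(rho)
        else:
            k_in[a], k_in[b] = k_in[b], k_in[a]  # undo the rejected swap
    return k_in, checkpoints


rng = random.Random(42)
k_out = [rng.randint(0, 6) for _ in range(200)]
k_in = k_out[:]
rng.shuffle(k_in)
new_in, checkpoints = tune_one_point_correlation(k_in, k_out, steps=3000)
print(checkpoints[0], checkpoints[-1], len(checkpoints))
```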

3.2.2 Analytical approach 

To gain qualitative insight into this behavior, let us consider the following joint degree distribution:

$$p_{jk} = p_j p_k + \rho\,\sigma_p^2\, m_{jk}, \qquad (72)$$

where the only criteria for the matrix $m_{jk}$ are:

$$\sum_{j=0}^{\infty} m_{jk} = \sum_{k=0}^{\infty} m_{jk} = 0, \qquad (73)$$

$$\sum_{j,k=0}^{\infty} j\,k\, m_{jk} = 1. \qquad (74)$$

With these conditions, the one-point degree correlation of the above joint degree distribution is exactly $\rho$ [35]. Note that we assumed the same distribution for the in- and out-degrees ($\sigma_V(k_{\mathrm{in}}) = \sigma_V(k_{\mathrm{out}}) = \sigma_p$). The generating function of the modified distribution is

$$g_{00}(x,y) = g_{00}^{0}(x,y) + \rho\,\chi_{00}(x,y), \qquad (75)$$

where $g_{00}^{0}$ is the generating function of the original joint degree distribution and we introduced the function

$$\chi_{00}(x,y) = \sigma_p^2 \sum_{j,k=0}^{\infty} x^{j} y^{k} m_{jk}. \qquad (76)$$
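The construction of Eqs. (72)–(74) can be verified numerically. The Python sketch below assumes Poisson degree distributions $p_j$ truncated at a maximum degree, and the separable choice $m_{jk} = (\phi_j - p_j)(\phi_k - p_k)/(\langle k\rangle_\phi - \langle k\rangle)^2$ with a second Poisson distribution $\phi_j$, which is how we read Eq. (85) below; the function names are hypothetical. The resulting joint distribution keeps the marginals $p_j$ and has one-point correlation $\rho$.

```python
import math


def poisson(mean, kmax):
    """Poisson pmf truncated at kmax and renormalized."""
    p = [math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1)) for k in range(kmax + 1)]
    total = sum(p)
    return [x / total for x in p]


def correlated_joint(p, phi, rho):
    """p_jk of Eq. (72) with m_jk = (phi_j - p_j)(phi_k - p_k) / (<k>_phi - <k>)^2 (assumed)."""
    ks = range(len(p))
    mean_p = sum(k * p[k] for k in ks)
    mean_phi = sum(k * phi[k] for k in ks)
    var_p = sum(k * k * p[k] for k in ks) - mean_p ** 2
    dev = [phi[k] - p[k] for k in ks]
    norm = (mean_phi - mean_p) ** 2  # makes sum_jk j k m_jk = 1, Eq. (74)
    return [[p[j] * p[k] + rho * var_p * dev[j] * dev[k] / norm for k in ks] for j in ks]


def one_point_correlation(P):
    """Pearson correlation of (j, k) under the joint pmf P[j][k], i.e. Eq. (70) for the ensemble."""
    ks = range(len(P))

    def expect(f):
        return sum(f(j, k) * P[j][k] for j in ks for k in ks)

    mj, mk = expect(lambda j, k: j), expect(lambda j, k: k)
    cov = expect(lambda j, k: j * k) - mj * mk
    var_j = expect(lambda j, k: j * j) - mj ** 2
    var_k = expect(lambda j, k: k * k) - mk ** 2
    return cov / math.sqrt(var_j * var_k)


p, phi = poisson(2.0, 40), poisson(1.0, 40)
P = correlated_joint(p, phi, 0.05)
print(one_point_correlation(P))           # ~0.05
print(min(min(row) for row in P) >= 0.0)  # True: small rho keeps every p_jk non-negative
```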



² This is the same protocol as the one used in [35].
Figure 13: The GRC versus the one-point correlation for different average degrees in the Erdős–Rényi and exponential graphs and for different exponents in the scale-free network. Every point on the plots is the average of at least 200 independent simulations on networks with $N = 10^4$ nodes. In the case of the scale-free network, only a small negative degree correlation is accessible through random optimization.



Above the critical average degree, we have to solve

$$ u = g_{01}(u, 1) + \frac{\rho}{\langle k \rangle}\, \partial_y \chi_{00}(u, 1), \tag{77} $$

and the GRC is given by the modified generating function:

$$ \mathrm{GRC} = g^{\rho}_{00}(u, 1) - g^{\rho}_{00}(u, u). \tag{78} $$

The exact solution of Eq. (77) depends on the actual choice of $m_{jk}$ and in most cases cannot be found in closed form. However, for small correlations ($\rho \ll 1$), we can describe the change in the GRC at a given average degree. In this case, the solution of Eq. (77) is very close to the solution of the original equation without correlations: $u = u_0 + u_\rho$, where $u_0 = g_{01}(u_0, 1)$. Expanding both sides and keeping only the terms linear in $\rho$ and $u_\rho$, we get:

$$ u_\rho = \beta(u_0)\, \rho, \qquad \beta(u_0) = \frac{\partial_y \chi_{00}(u_0, 1)}{\langle k \rangle \left[1 - \partial_x g_{01}(u_0, 1)\right]}. \tag{79} $$
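As a sanity check of the linearization, one can solve Eq. (77) directly for a small $\rho$ and compare the shift of the root with $\beta(u_0)\rho$. The sketch below rests on our own illustrative assumptions: a directed ER graph with Poisson in- and out-degrees of mean $c$ (so $\sigma_p^2 = \langle k \rangle = c$ and $g_{01}(x, 1) = e^{c(x-1)}$), and the $m_{jk}$ of Eq. (85) with a Poisson $\phi_j$ of mean 1, for which $\partial_y \chi_{00}(u, 1) = \sigma_p^2 [g_0(u) - g_\phi(u)]/(\langle k \rangle - \langle k \rangle_\phi)$ (see Appendix B). The function names are hypothetical.

```python
import math

def u_of_rho(c, rho, tol=1e-15, max_iter=100_000):
    """Fixed-point solution of Eq. (77) for a directed ER graph (assumed).

    With Poisson(c) degrees, sigma_p^2 = <k> = c and g_01(x, 1) = exp(c (x - 1));
    with phi_j = Poisson(1) in Eq. (85), (rho / <k>) d_y chi_00(u, 1)
    equals rho * (g_0(u) - g_phi(u)) / (c - 1).
    """
    u = 0.0
    for _ in range(max_iter):
        g0 = math.exp(c * (u - 1))
        gphi = math.exp(u - 1)
        new = g0 + rho * (g0 - gphi) / (c - 1)
        if abs(new - u) < tol:
            return new
        u = new
    raise RuntimeError("fixed-point iteration did not converge")

def beta(c):
    """Coefficient of Eq. (79), evaluated at the uncorrelated solution u_0."""
    u0 = u_of_rho(c, 0.0)
    g0 = math.exp(c * (u0 - 1))
    numerator = c * (g0 - math.exp(u0 - 1)) / (c - 1)   # d_y chi_00(u0, 1)
    denominator = c * (1 - c * g0)                       # <k> [1 - d_x g_01(u0, 1)]
    return numerator / denominator

# Central finite difference of the exact root versus the linear prediction beta * rho.
for c in (1.5, 2.0, 3.0):
    h = 1e-6
    slope = (u_of_rho(c, h) - u_of_rho(c, -h)) / (2 * h)
    print(c, beta(c), slope)
```

Iterating from $u = 0$ is a deliberate choice: the map is increasing, so the iteration approaches the smaller root $u_0 < 1$ rather than the trivial root $u = 1$.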

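The coefficient $\beta(u)$ is negative above the transition point for the model networks we are interested in (for the proof, see Appendix B). In the linear approximation, the GRC takes the form

$$ \mathrm{GRC} = \mathrm{GRC}_0 + \mathrm{GRC}_\rho\, \rho, \tag{80} $$

where

$$ \mathrm{GRC}_0 = g_{00}(u_0, 1) - g_{00}(u_0, u_0) \tag{81} $$

and

$$ \mathrm{GRC}_\rho = \beta(u_0)\,(1 - 2u_0) \left.\frac{d g_0}{dx}\right|_{x=u_0} - \chi_{00}(u_0, u_0). \tag{82} $$

In the derivation of $\mathrm{GRC}_\rho$, we used the observations that when the joint degree distribution factorizes, the generating function is also a product of the single distributions, i.e.,

$$ g_{00}(x, y) = g_0(x)\, g_0(y), \tag{83} $$

and also

$$ g_0(u_0) = \sum_{j=0}^{\infty} p_j u_0^j = \frac{1}{\langle k \rangle} \sum_{j,k=0}^{\infty} k\, u_0^j\, p_j p_k = g_{01}(u_0, 1) = u_0. \tag{84} $$

We also used that $\chi_{00}(x, 1) = 0$, since the rows of $m_{jk}$ sum to zero (Eq. (73)).

Now we have to interpret the result we obtained for the change in the GRC. First, for simplicity, let us choose $m_{jk}$ as proposed in [35]:

$$ m_{jk} = \frac{(p_j - \phi_j)(p_k - \phi_k)}{\left(\langle k \rangle - \langle k \rangle_\phi\right)^2}, \tag{85} $$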




Figure 14: The largest reachable set below the transition point (a) and one of the nodes with large out-degree in the largest reachable set (A). Without degree correlations, the expected in-degree of A is the average degree, and the maximum out-degree of its in-neighbors is also moderate (b). With degree correlations, nodes with large out-degree have large in-degree as well (c), and with an increasing number of in-neighbors, the expected maximum out-degree of these neighbors increases as well.



where $\phi_j$ is an arbitrary normalized distribution and $\langle k \rangle_\phi$ is its average. Moreover, choose the $\phi_j$ distribution such that $\langle k \rangle_\phi = 1$, which is the critical point of the ER and exponential networks. In this case $\chi_{00}(u_0, u_0) > 0$, and also note that

$$ \beta(u_0) < 0, \tag{86} $$

$$ \left.\frac{d g_0}{dx}\right|_{x=u_0} > 0 \tag{87} $$

above the transition point. The behavior of $\mathrm{GRC}_\rho$ is governed by the relation between the two terms on the right-hand side of Eq. (82). It is easy to check that near the critical point ($u_0 \approx 1$), the second term is very close to zero (since $g_0(1) = g_\phi(1) = 1$), and $\mathrm{GRC}_\rho$ is dominated by the first term, which is positive. This means that when the average degree is small, but the graph already has giant components of finite size, small one-point correlations increase its hierarchical structure, as in Fig. 13. For large average degrees, $u_0 \to 0$ and both terms in Eq. (82) become negative, resulting in a decreasing effect of small correlations, in good accordance with the numerical results (see Fig. 13a and Fig. 13b). In the regime where the GRC has a maximum ($u_0 = 1/2$, where $\mathrm{GRC}_0 = u_0 - u_0^2$ is largest and the first term of Eq. (82) vanishes), Eq. (82) predicts a negative coefficient for $\rho$, but as Fig. 13b shows, this is not the case for the exponential network. This discrepancy suggests that the change in the slope near this point is not trivial and strongly depends on the details of how the correlations are added.
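The sign change of $\mathrm{GRC}_\rho$ described above can be reproduced under the same illustrative assumptions as before: a directed ER graph with Poisson degrees of mean $c$, and Eq. (85) with a Poisson $\phi_j$ of mean 1, so that Eqs. (76) and (85) give $\chi_{00}(x, y) = \sigma_p^2 [g_0(x) - g_\phi(x)][g_0(y) - g_\phi(y)]/(\langle k \rangle - \langle k \rangle_\phi)^2$. This minimal sketch evaluates Eq. (82) and cross-checks it against a finite difference of Eq. (78) solved exactly at $\pm\rho$. All function names are hypothetical.

```python
import math

def g0(c, x):
    """Poisson(c) generating function: in/out-degrees of the assumed ER graph."""
    return math.exp(c * (x - 1))

def gphi(x):
    """Poisson(1) generating function of the assumed auxiliary phi_j in Eq. (85)."""
    return math.exp(x - 1)

def chi00(c, x, y):
    """Eq. (76) with m_jk from Eq. (85): sigma_p^2 = c and <k>_phi = 1."""
    return c * (g0(c, x) - gphi(x)) * (g0(c, y) - gphi(y)) / (c - 1) ** 2

def u_of_rho(c, rho, tol=1e-15, max_iter=100_000):
    """Root of Eq. (77) reached by fixed-point iteration from u = 0."""
    u = 0.0
    for _ in range(max_iter):
        new = g0(c, u) + rho * (g0(c, u) - gphi(u)) / (c - 1)
        if abs(new - u) < tol:
            return new
        u = new
    raise RuntimeError("fixed-point iteration did not converge")

def grc_exact(c, rho):
    """Eq. (78) with g00^rho(x, y) = g_0(x) g_0(y) + rho * chi00(x, y)."""
    u = u_of_rho(c, rho)
    return (g0(c, u) + rho * chi00(c, u, 1.0)) - (g0(c, u) ** 2 + rho * chi00(c, u, u))

def grc_rho(c):
    """Linear coefficient GRC_rho of Eq. (82)."""
    u0 = u_of_rho(c, 0.0)
    beta = (c * (g0(c, u0) - gphi(u0)) / (c - 1)) / (c * (1 - c * g0(c, u0)))
    return beta * (1 - 2 * u0) * c * u0 - chi00(c, u0, u0)

for c in (1.1, 2 * math.log(2), 4.0):     # near k_c, at u_0 = 1/2, dense graph
    print(round(c, 3), grc_rho(c))
```

Under these assumptions $\mathrm{GRC}_\rho$ comes out positive just above $k_c$ and negative both at $\langle k \rangle = 2\ln 2$ (where $u_0 = 1/2$) and at $\langle k \rangle = 4$, matching the signs discussed above; it says nothing about the exponential-network discrepancy, which lies outside the linear theory.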

Below the transition point, there are no giant components and the GRC is dominated by the largest reachable set. We can assume that the node with the largest reachable set (node C in Fig. 14a) or some of its out-neighbors have large out-degree, since this increases the probability of reaching more nodes. Consider such a neighbor with large out-degree (node A in Fig. 14a). In the case of negative or zero correlations, the expected maximum out-degree of its in-neighbors is small (Fig. 14b). In the presence of positive one-point correlations, however, the numbers of in- and out-edges of a node tend to be similar; in other words, the node in question has a large in-degree as well. In this case, the expected maximum out-degree of its in-neighbors (and thus of the candidate node for the largest local reaching centrality) increases, and more nodes become reachable. The scale-free network is a special case in the sense that the largest reachable set is, to a good approximation, the neighborhood of a hub. Because of the correlations, this hub tends to have many in-neighbors as well, and it is more likely to have a neighbor with many links (Fig. 14c). The numerical results in Fig. 13c suggest that the maximum out-degree of the nearest neighbors reaches its saturation level at relatively small correlations.



3.3 Two-point correlations 

The two-point correlation of the type we investigate in this paper is the generalization of assortativity to directed networks [35]. In order to introduce this correlation into our calculations, let $\mathcal{P}(j_{in}, j_{out}; k_{in}, k_{out})$ denote the probability that a randomly chosen edge starts at a node with $j_{in}$ in- and $j_{out}$ out-degree and points to a node with $k_{in}$ in- and $k_{out}$ out-degree. This probability satisfies the following equations:

$$ \sum_{j_{in}, j_{out}=0}^{\infty} \mathcal{P}(j_{in}, j_{out}; l_{in}, l_{out}) = \sum_{k_{in}, k_{out}=0}^{\infty} \mathcal{P}(l_{in}, l_{out}; k_{in}, k_{out}) = p_{l_{in} l_{out}}, \tag{88} $$

which also establish its connection to the joint degree distribution. Without two-point correlations (and other correlations), the two-node joint distribution factorizes:

$$ \mathcal{P}(j_{in}, j_{out}; k_{in}, k_{out}) = p_{j_{in}}\, p_{j_{out}}\, p_{k_{in}}\, p_{k_{out}}. \tag{89} $$

We can modify it by adding a correlation between $p_{j_{out}}$ and $p_{k_{in}}$:

$$ \mathcal{P}_r(j_{in}, j_{out}; k_{in}, k_{out}) = p_{j_{in}}\, p_{k_{out}} \left[ p_{j_{out}}\, p_{k_{in}} + r\, \sigma(p_{j_{out}})\, \sigma(p_{k_{in}})\, m_{j_{out} k_{in}} \right]. \tag{90} $$
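The construction of Eq. (90) can also be checked numerically. The sketch below assumes Poisson in- and out-degree distributions of mean 2, truncated at degree 15, and, purely for illustration, the $m_{jk}$ of Eq. (85) with a Poisson $\phi_j$ of mean 1; with identical in- and out-degree laws, $\sigma(p_{j_{out}})\sigma(p_{k_{in}})$ is simply the variance. It tabulates $\mathcal{P}_r$, sums out the source's degrees to recover the uncorrelated table $p_{l_{in}} p_{l_{out}}$, and measures the correlation between $j_{out}$ and $k_{in}$, which equals $r$. The helper names are hypothetical.

```python
import math
from itertools import product

def poisson(mean, kmax):
    """Poisson pmf on 0..kmax, renormalised after truncation (assumed degree law)."""
    w = [math.exp(-mean) * mean**k / math.factorial(k) for k in range(kmax + 1)]
    total = sum(w)
    return [x / total for x in w]

K = 15                                   # maximal degree kept (assumption)
p = poisson(2.0, K)                      # same pmf for in- and out-degrees
phi = poisson(1.0, K)                    # auxiliary pmf for m_jk (assumption)
mean_p = sum(k * x for k, x in enumerate(p))
var_p = sum(k * k * x for k, x in enumerate(p)) - mean_p**2
mean_phi = sum(k * x for k, x in enumerate(phi))
d = [a - b for a, b in zip(p, phi)]
m = [[d[a] * d[b] / (mean_p - mean_phi) ** 2 for b in range(K + 1)] for a in range(K + 1)]

def two_point(r):
    """P_r(j_in, j_out; k_in, k_out) of Eq. (90), keyed by the four degrees."""
    degrees = range(K + 1)
    return {
        (ji, jo, ki, ko): p[ji] * p[ko] * (p[jo] * p[ki] + r * var_p * m[jo][ki])
        for ji, jo, ki, ko in product(degrees, repeat=4)
    }

def target_marginal(P):
    """Sum over the source's degrees, leaving a table indexed by (l_in, l_out)."""
    out = [[0.0] * (K + 1) for _ in range(K + 1)]
    for (ji, jo, ki, ko), w in P.items():
        out[ki][ko] += w
    return out

def out_in_correlation(P):
    """Pearson correlation between j_out of the source and k_in of the target."""
    ex = sum(jo * w for (_, jo, _, _), w in P.items())
    ey = sum(ki * w for (_, _, ki, _), w in P.items())
    vx = sum(jo * jo * w for (_, jo, _, _), w in P.items()) - ex**2
    vy = sum(ki * ki * w for (_, _, ki, _), w in P.items()) - ey**2
    cov = sum(jo * ki * w for (_, jo, ki, _), w in P.items()) - ex * ey
    return cov / math.sqrt(vx * vy)

P = two_point(0.1)
print(round(out_in_correlation(P), 6))
```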
However, if we recover the joint degree distribution, the added term vanishes:

$$ \sum_{j_{in}, j_{out}=0}^{\infty} \mathcal{P}_r(j_{in}, j_{out}; l_{in}, l_{out}) = p_{l_{in}}\, p_{l_{out}} = p_{l_{in} l_{out}}, \tag{91} $$

because the sum of any row or column of the matrix $m_{j_{out} k_{in}}$ must be zero (Eq. (73)). This means that the assortativity defined by Eq. (11) does not affect the GRC directly (via the joint degree distribution). Fig. 15 depicts the numerical results that were produced by
Figure 15: The effect of two-point correlations on the GRC for the random graph models. Every point is an average of at least 100 independent measurements on networks with $N = 1000$.



the same protocol as in the previous subsection. In the case of the ER and exponential networks, a small decrease in the GRC can be observed, regardless of the average degree. The SF network shows non-monotonic behavior, but for larger assortativity a decrease in the GRC can be seen. These results point out that, although two-point correlation does not affect the joint degree distribution directly, it changes the hierarchical structure of the network slightly. This small effect can be understood by looking at the different typical structures around an edge (Fig. 16). In the case of large two-point correlations, the nodes with large out-degree (those that are expected to have larger reachable sets) have out-neighbors that also have large in-degrees (Fig. 16a). This means that the nodes they can directly reach are also reachable from many other nodes (which also have large out-degrees). This reduces the difference between the local reaching centralities among nodes with large $C_R$ or, in other words, introduces higher $C_R(i)$ values into the definition of the GRC (see Eq. (2)). Positive two-point correlation also introduces bottlenecks in the network (Fig. 16b). These structures accumulate many nodes on the in-edges that have similar reachable sets, thus also reducing the differences. On the other hand, if the graph has negative two-point correlations, nodes with large out-degree can reach their out-neighbors uniquely (Fig. 16c). Nodes that are reachable from independent directions also emerge (Fig. 16d); these nodes have in-neighbors with small out-degree. Both effects decrease the ratio of overlapping reachable sets, which results in reduced similarity among the $C_R$ values.
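To make the discussion of reachable sets concrete, the following minimal sketch computes the local reaching centralities and the GRC of a small directed graph. It restates Eq. (2) under the usual definition, which we assume here: $C_R(i)$ is the fraction of the other $N-1$ nodes reachable from $i$ along directed edges, and $\mathrm{GRC} = \sum_i [C_R^{max} - C_R(i)]/(N-1)$. The function names and toy graphs are our own, not the code used for Fig. 15.

```python
from collections import deque

def reachable_count(adj, source):
    """Number of nodes reachable from `source` along out-edges, excluding source."""
    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) - 1

def global_reaching_centrality(adj, nodes):
    """GRC = sum_i [C_R^max - C_R(i)] / (N - 1), with C_R(i) the reachable fraction."""
    n = len(nodes)
    cr = [reachable_count(adj, i) / (n - 1) for i in nodes]
    c_max = max(cr)
    return sum(c_max - c for c in cr) / (n - 1)

star = {0: [1, 2, 3, 4]}                   # one node reaches everybody
cycle = {0: [1], 1: [2], 2: [3], 3: [0]}   # everybody reaches everybody
chain = {0: [1], 1: [2], 2: [3]}           # 0 -> 1 -> 2 -> 3

for name, adj, nodes in (("star", star, range(5)), ("cycle", cycle, range(4)),
                         ("chain", chain, range(4))):
    print(name, global_reaching_centrality(adj, nodes))
```

A breadth-first search per node costs $O(N(N+E))$ overall, which is adequate for sanity checks; the limiting cases (the star as a perfect hierarchy, the cycle with fully overlapping reachable sets) bracket the behavior described in the text.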
Figure 16: Illustration of the two limiting cases of the two-point correlation $r$ defined by Eq. (11). The correlated edges are shown in red. In the case of $r > 0$, the reachable sets of hubs tend to be similar (a) and bottlenecks emerge (b), causing an increase in the overlap of reachable sets. In negatively correlated networks ($r < 0$), hubs can have their own reachable sets (c) and nodes with large in-degree tend to be reachable from independent parts of the network (d), decreasing the overlap between reachable sets.



4 Discussion 



The global reaching centrality (GRC) is a measure constructed to provide a characteristic number for the flow hierarchy of a directed network. In this paper we investigated the behavior of this measure and its dependence on the joint degree distribution. We have shown that the hierarchical structure of a random network strongly depends on the existence of the giant strongly connected component (GSCC) and other giant components in the network. In the regime of low average degree, the GRC is dominated by the node with the largest reachable set, which can be approximated from the distribution of the local reaching centralities. This distribution is connected to the distribution of small out-components in a network and can be determined analytically using the generating function method generalized to directed networks. In the presence of the GSCC, the GRC can be well approximated by the difference between the sizes of the giant in-component (GIN) and the GSCC, and both can be calculated exactly for some network models with a given degree distribution. Using these approximations for the GRC, we calculated its dependence on the average degree for different network models: a directed random tree, the Erdos-Renyi (ER) graph, the exponential network and the scale-free (SF) network. The results show that, in the sense of hierarchy, there is an optimal average degree at which the GRC has a maximum value. This is a result of the competition between the size of the largest reachable set (which can be found among the nodes in the GIN) and the average size of the reachable sets (which is dominated by the nodes in the GSCC). Near the transition point, the size of the GIN is close to the size of the GSCC, and the difference increases since the GIN grows faster in the beginning. In the limit of large average degree, the sizes of both giant components tend to become equal, since the two sets of nodes tend to coincide.
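The competition described above can be illustrated with the approximation $\mathrm{GRC} \approx |\mathrm{GIN}| - |\mathrm{GSCC}| = g_{00}(u, 1) - g_{00}(u, u)$, which for uncorrelated networks with identical in- and out-degree distributions reduces to $u_0(1 - u_0)$ with $u_0 = g_0(u_0)$. This minimal sketch assumes Poisson (ER) and geometric ("exponential", $p_k = (1-a)a^k$, for which $u_0 = 1/\langle k \rangle$) degree distributions and scans the average degree above the transition only; it does not model the low-density regime governed by the largest reachable set. The function names are hypothetical.

```python
import math

def smallest_root(g, tol=1e-15, max_iter=1_000_000):
    """Iterate u -> g(u) from u = 0; converges to the smallest root of u = g(u)."""
    u = 0.0
    for _ in range(max_iter):
        new = g(u)
        if abs(new - u) < tol:
            return new
        u = new
    raise RuntimeError("fixed-point iteration did not converge")

def grc0_er(c):
    """GIN minus GSCC for a directed ER graph with Poisson(c) in/out-degrees."""
    u0 = smallest_root(lambda x: math.exp(c * (x - 1)))
    return u0 - u0 * u0

def grc0_exponential(m):
    """Same quantity for geometric degrees with mean m, where u0 = 1/m in closed form."""
    u0 = 1.0 / m
    return u0 - u0 * u0

grid = [1.05 + 0.001 * i for i in range(3951)]   # average degrees 1.05 ... 5.0
best_er = max(grid, key=grc0_er)
best_exp = max(grid, key=grc0_exponential)
print(best_er, best_exp)
```

In both models the maximum sits where $u_0 = 1/2$, i.e. at $\langle k \rangle = 2\ln 2$ for the ER graph and $\langle k \rangle = 2$ for the geometric case, with maximal value $1/4$.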

We also investigated the dependence of the GRC on two types of degree correlations: one-point correlation (between the in-degree and out-degree of a node) and two-point correlation (between the out-degree of the source and the in-degree of the target of a directed edge). Numerical results show that the hierarchical structure changes systematically with the one-point correlation. This is in accordance with the fact that the GRC can be expressed with the joint degree distribution, and the latter is directly affected by the one-point correlations. We have pointed out that in the two limiting cases of the average degree, the qualitative effect of small correlations can be understood by adding a correlation term to the uncorrelated joint degree distribution. When the average degree is very low, small correlations have a positive effect on the GRC. This is not true for denser graphs, in which correlations decrease the GRC. There is also a regime of the average degree in which a given level of correlation can maximize the hierarchy.

Both numerical and analytical results suggest that two-point correlation (which is the generalization of assortativity to directed networks) does not affect the GRC directly. However, a small negative effect can be observed, which can be understood by looking at the effect of the two-point correlation on the neighborhood of an edge.

The results on the random network models have given deeper insight into the behavior of the GRC, but we have to keep in mind that they are only the first steps towards understanding hierarchy. The main message of the results is that hierarchy is sensitive to the edge density of a network and is more likely to emerge in sparse networks. This is in good accordance with the observation that most real-world networks have low density and many of them have an inherent hierarchical structure. The results also point out that correlations can have a large effect and can change the hierarchical structure fundamentally (they can produce a GRC of finite magnitude even if the network would otherwise have an infinitesimal GRC). This is another indicator of the well-known fact that one has to take into account the presence of correlations when dealing with real networks.
A Out-components in directed graphs

Using the definition of the generating function h (y), we can obtain the probabilities for 
the out-components: 
Substituting the expression inside the derivative from Eq. (9), we get:

This equation can be transformed into an integral by the Cauchy formula:

$$ P^{out}_s = \frac{1}{2\pi i\,(s-1)} \oint \frac{dy}{(y - y_0)^{s-1}}\, \frac{d\, g_{00}(1, h_1(y))}{dy} = \frac{\langle k \rangle}{2\pi i\,(s-1)} \oint \frac{dh_1}{y^{s-1}}\, g_{01}(1, h_1). \tag{94} $$



The contour goes around the origin and has an infinitesimal radius, ensuring that it does not enclose any other poles. In our calculations $y_0 = 0$. In the last equation we simply changed variables. Note that when $y \to 0$, $h_1$ also converges to zero, as can easily be seen from Eq. (10). We can eliminate $y$ from the integrand using Eq. (10):



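$$ P^{out}_s = \frac{\langle k \rangle}{2\pi i\,(s-1)} \oint \frac{dh_1}{h_1^{s-1}}\, g_{01}(1, h_1)\, g_{10}(1, h_1)^{s-1}. \tag{95} $$

A second application of the Cauchy formula gives us the final form of the out-component distribution:

$$ P^{out}_s = \frac{\langle k \rangle}{(s-1)!} \left[ \frac{d^{s-2}}{dh^{s-2}}\, g_{01}(1, h)\, g_{10}(1, h)^{s-1} \right]_{h=0}. \tag{96} $$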
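A quick numerical check of this result is possible in the uncorrelated case. Assuming $p_{jk} = p_j p_k$ and that $h_1$ obeys $h_1(y) = y\, g_{10}(1, h_1(y))$, one has $g_{01}(1, h) = g_0'(h)/\langle k \rangle$ and $g_{10}(1, h) = g_0(h)$, and for $s \ge 2$ the formula amounts to the coefficient extraction $P^{out}_s = \frac{\langle k \rangle}{s-1} [h^{s-2}]\, g_{01}(1, h)\, g_{10}(1, h)^{s-1}$, while $P^{out}_1 = p_0$. The sketch below (function names are our own) evaluates this with truncated power series for Poisson degrees and compares it with the known Borel distribution $(cs)^{s-1} e^{-cs}/s!$ of component sizes in an ER graph of mean degree $c$.

```python
import math

def poisson(mean, kmax):
    """Poisson pmf on 0..kmax, renormalised after truncation (assumed degree law)."""
    w = [math.exp(-mean) * mean**k / math.factorial(k) for k in range(kmax + 1)]
    total = sum(w)
    return [x / total for x in w]

def series_mul(a, b, n):
    """Product of two power series (coefficient lists), truncated after h**n."""
    out = [0.0] * (n + 1)
    for i in range(min(n, len(a) - 1) + 1):
        for j in range(min(n - i, len(b) - 1) + 1):
            out[i + j] += a[i] * b[j]
    return out

def out_component_pmf(p, s):
    """P_s^out = <k>/(s-1) [h^(s-2)] g_01(1,h) g_10(1,h)^(s-1), uncorrelated p_jk."""
    if s == 1:
        return p[0]                       # no out-edges: only the node itself
    n = s - 2
    mean = sum(k * pk for k, pk in enumerate(p))
    g10 = [p[k] if k < len(p) else 0.0 for k in range(n + 1)]           # g_0(h)
    g01 = [(k + 1) * p[k + 1] / mean if k + 1 < len(p) else 0.0
           for k in range(n + 1)]                                        # g_0'(h)/<k>
    f = g01
    for _ in range(s - 1):
        f = series_mul(f, g10, n)
    return mean / (s - 1) * f[n]

def borel(c, s):
    """Borel distribution: component sizes of an ER graph with mean degree c < 1."""
    return (c * s) ** (s - 1) * math.exp(-c * s) / math.factorial(s)

p = poisson(0.7, 60)
for s in range(1, 8):
    print(s, out_component_pmf(p, s), borel(0.7, s))
```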
B Proof of $\beta(u) < 0$

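We show that the coefficient

$$ \beta(u) = \frac{\partial_y \chi_{00}(u, 1)}{\langle k \rangle \left[1 - \partial_x g_{01}(u, 1)\right]}, \tag{97} $$

which appears in Eq. (79), is negative for any $\langle k \rangle > k_c$. The numerator has the form

$$ \partial_y \chi_{00}(u, 1) = \sigma_p^2 \sum_{j,k=0}^{\infty} u^j\, k\, m_{jk}, \tag{98} $$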
where the only criteria for $m_{jk}$ are Eqs. (73)-(74). They can be satisfied by the following choice [35]:

$$ m_{jk} = \frac{(p_j - \phi_j)(p_k - \phi_k)}{\left(\langle k \rangle - \langle k \rangle_\phi\right)^2}, \tag{99} $$

where $\phi_j$ is an arbitrary normalized distribution and $\langle k \rangle_\phi$ is the corresponding average. With this form of $m_{jk}$, the numerator becomes

$$ \partial_y \chi_{00}(u, 1) = \sigma_p^2\, \frac{g_0(u) - g_\phi(u)}{\langle k \rangle - \langle k \rangle_\phi}, \tag{100} $$

where $g_0(u)$ and $g_\phi(u)$ are the generating functions of the distributions $p_j$ and $\phi_j$, respectively. Assume that $\langle k \rangle > \langle k \rangle_\phi$ and fix the $\phi_j$ distribution such that $\langle k \rangle_\phi = k_c$. It can easily be seen that for the ER and exponential networks (with $\phi_j$ taken from the same family of distributions), $g_0(u) < g_\phi(u)$ for any $0 \le u < 1$, so the numerator of $\beta(u)$ is negative.
Figure 17: Illustration of $F(\langle k \rangle)$ for the proof of $\beta(u) < 0$. Its value at $k_c$ follows directly from the definition, and its derivative is examined in the text.



It only remains to show that $\partial_x g_{01}(u, 1) < 1$ above the transition point, so that the denominator is positive. To see this, let us consider the function in question as a function of the average degree:

$$ F(\langle k \rangle) = \partial_x g_{01}\left[u(\langle k \rangle), 1\right], \tag{101} $$

and note that $F(k_c) = 1$. Furthermore, $F$ is a decreasing function of its argument:
The first term on the right-hand side is positive for all $0 < u < 1$, because it is a sum of positive numbers. In contrast, the second term is negative in the ER and exponential networks, because the sizes of the giant components increase with the average degree. Thus $F(\langle k \rangle) < 1$ (see Fig. 17).
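The two inequalities used in this proof can be spot-checked numerically. The sketch assumes Poisson (ER) and geometric ("exponential") degree distributions with mean $\langle k \rangle$ and takes $\phi_j$ from the same family with mean $k_c = 1$; for an uncorrelated $p_{jk}$ with identical in- and out-degree laws, $F(\langle k \rangle) = \partial_x g_{01}(u_0, 1) = g_0'(u_0)$. It computes $u_0$, $F$, and the numerator of Eq. (100). The function names are hypothetical.

```python
import math

def smallest_root(g, tol=1e-15, max_iter=1_000_000):
    """Iterate u -> g(u) from u = 0; converges to the smallest root of u = g(u)."""
    u = 0.0
    for _ in range(max_iter):
        new = g(u)
        if abs(new - u) < tol:
            return new
        u = new
    raise RuntimeError("fixed-point iteration did not converge")

def er_check(c):
    """Directed ER with Poisson(c) degrees (assumed); phi_j = Poisson(1)."""
    g0 = lambda x: math.exp(c * (x - 1))
    gphi = lambda x: math.exp(x - 1)
    u0 = smallest_root(g0)
    F = c * g0(u0)                                      # g_0'(u0) = d_x g_01(u0, 1)
    numerator = c * (g0(u0) - gphi(u0)) / (c - 1)       # Eq. (100), sigma_p^2 = c
    return u0, F, numerator

def exp_check(m):
    """Geometric degrees p_k = (1 - a) a**k with mean m (assumed); phi_j has mean 1."""
    a = m / (1 + m)
    g0 = lambda x: (1 - a) / (1 - a * x)
    gphi = lambda x: 0.5 / (1 - 0.5 * x)
    var = m * (1 + m)                                   # variance of the geometric law
    u0 = smallest_root(g0)
    F = a * (1 - a) / (1 - a * u0) ** 2                 # g_0'(u0)
    numerator = var * (g0(u0) - gphi(u0)) / (m - 1)     # Eq. (100)
    return u0, F, numerator

for k in (1.2, 2.0, 5.0):
    print(k, er_check(k), exp_check(k))
```

For the geometric case the output can be compared with the closed form $u_0 = F = 1/\langle k \rangle$, which follows from solving $u = (1-a)/(1-au)$.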